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Abstract 

The non isotropic noncentral elliptical shape distributions via pseudo-Wishart dis- 
tribution are founded. This way, the classical shape theory is extended to non 
isotropic case and the normality assumption is replaced by assuming a elliptical 
distribution. In several cases, the new shape distributions are easily computable 
and then the inference procedure can be studied under exact densities. An ap- 
plication in Biology is studied under the classical gaussian approach and two non 
gaussian models. 

1 Introduction 

With the introduction of several innovative statistical and mathematical tools for high- 
dimensional data analysis, now the classical multivariate analysis have a new and modern 
image. Developments as generalised multivariate analysis, latent variable analysis, DNA 
microarray data, pattern recognition, multivariate analysis nonlinear, data mining, manifold 
learning, shape theory, etc., open a range of potential applications in many areas of the 
knowledge. 

As consequence of these new statistical and mathematical tools a new theory can be 
considere from the conjunction between generalised multivariate analysis and the statistical 
shape theory is termed Generalised Shape Theory, in which the methodology developed in 
the shape theory under Gaussian models is extended to a general class of distributions, the 
elliptically contoured densities. 
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Having this goal in mind, recall that X : N x K has a matrix multivariate elliptically 
contoured distribution if its density with respect to the Lebesgue measure on $l NK is given 
by: 



/x(X) 
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i S |K/a|e|Jv/a 



ft {tr [(X - /ti)'5r x (X - n)®' 1 ] } 



where (i:NxK, £ : iV x TV, ® :KxK, S positive definite (S > 0), > 0. The function 
ft : 5ft — > [0, oo) is termed the generator function, and it is such t hat f °° u NK ~ 1 h(u 2 )du < 
oo. Such a distributio n is de noted by X ~ £ nxk(li>, S, 0, ft), see iFang and Zhang (1990) 
and iGupta and Vargal (|1993l) . Observe that this class of matrix multivariate distributions 
includes Gaussian, Contaminated Normal, Pearson type II and VI, Kotz, Jensen-Logistic, 
Power Exponential, Bessel, among other distributions; whose distributions have tails that 
are weighted more or less, and/or distributions with greater or smaller degree of kurtosis 
than the Gaussian model. 

Now, in shape theory context, it is known that the shape of an object is all geometrical 
information that remains after filtering out translation, rotation and scale information of 
an original figure (represented by a matrix X) comprised in N landmarks in K dimensions. 
Hence, we say that two figures, Xi : N x K and X 2 : N x K have the same shape if 
they are related with a special similarity transformation X 2 = /?XiH + In'J 1 , where H : 
K x K £ SO{K) = {H € 5R KxA "|HH' = H'H = l K and |H| = +1} (the rotation), 
7 : K X 1 (the translation), ljy : N X 1, ljv = (1, 1, . . . , 1)', and /? > (the scale). Thus, 
in this context, the shape of a matrix X is all the geomet rical information about X th at is 
invariant under Eucl i dean similarity transformations, see Goodall and Mardia I (|l993l ) and 



Drvden and Mardia 1 11998) 



In the classical statistical shape theory is assum ed that X has the isotropi c matrix 
multivariate Gaussian distribution with mean /x x , see iGoodall and Mardia I ([1993I ). i.e. 



X ~ 7Vat x k(/x x , <t 2 I n ® Ik)- 
In the context of the generalised shape theory, it is assumed that 



X-£Ar xX (At x ,S x ®0,ft). 
Thus, two fundamental extensions of classical shape theory are provided, namely: 

• The generalised theory assumes a matrix multivariate elliptical distribution for the 
landmark data instead of considering a matrix multivariate Gaussian distribution. 

• Also, the usual isotropic Gaussian condition is replaced by assuming a non isotropic 
elliptical model. Two important advantages are obtained: first, the errors are corre- 
lated among landmarks, this is considered with the introduction of S, a N x N definite 
positive matrix; and second, the errors are correlated among coordinates of landmarks, 
this condition is noticed with the introduction of 0, a K x K definite positive matrix. 

The shape coordinates denoted as u of X can be c onstructed by several ways in terms 
of QR dec omposition, see Goodall and Mardia (1 19931) ; and singular value decompo sition 



(SVD), see iGoodalll (Il99lh . iLe and Kendalil (Il993l) and IGoodall and Mardia I dl993h . For 



example, in terms of the QR decomposition, shape coordinates u of X are constructed in 
several steps summarised in the expression 



LX0 



-1/2 



LZ = Y = TH = rWH = rW(u)H, 



(1) 



Observe that fi z = A* x 1 ' 2 and the QR shape coordinates of fi z are defined analogously. 
The matrix L : (N — 1) x N has orthonormal rows to 1 = (1, ...,!)'. L can be a submatrix 
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of the Helmert matrix, for example. Now, let be n — min(A — 1, K) and p — rank/i. In ([T]), 
Y = TH is the QR decomposition, where T : (N — 1) x n is lower triangular with tu > 0, 
i = 1, . . . , min(n, X — 1), and H:nx K, H e V n ,K = {H G 9fi" xK |HH' = I„}, the Stiefel 
manifold. Note that T is invariant to translations and rotations of Z. The matrix T is 
referred as the QR size- and- shape and their elements are the QR size-and-shape coordinates 
of the original landmark data Z. Typically in shape analysis there are more landmarks 
than dimensions (N > K). H acts on the right to transform $t K instead of acting on the 
left as in the multivariate analysis. In our case we see the landmarks as variables and the 
dimensions as observations, then the transposes of our matrices Z and Y can be seen as 
classical multivariate data matrices. Now, if we divide T by its size, the centroid size of Z, 

r = 1 1 Til = Vtr T'T = IIYII. 



= 1, the elements of 
nK +n(n+ l)/2-l 



we obtain the so-termed QR shape matrix W in ([T]). Note that ||W 
W are a direction vector for shape, and u comprises m = (N — 1)K - 
generalised polar coordinates. 

Observe that, if 1 / 2 is the positive definite squ are root of the matrix 0, i 
(0 1/2 ) 2 , with 1 / 2 : K x K, iGupta and Vargal (|l993l . p. 11), and noting that 







X0-!X' = X(0 1 / 2 1 / 2 )- 1 X' = X0- 1 / 2 (X0- 1 /2)' = ZZ ' 5 



where 



then 



X0 



-1/2 



Z ~ £nxk{(^ z , E x ,Iff , h) 
Gupta and Vargal (|l993l . p. 20). 



with fi z = /x x -1 / 2 , see 

And we arrive at the classical starting point in shape theory where the original landmark 
matrix is replaced by Z = X0 -1 / 2 . Then we can proceed as usual, removing from Z, 
translation, scale, rotation and/or reflection in order to obtain the shape of Z (or X) via 
the QR decomposition, for example. 

Let be fi = Lfi , then Y : (JV — 1) x K is invariant to translations of the figure Z, and 



Y - £ w _i xK (/x0- 1/2 , S ® I K , h), 



where S = LS X L 7 . 



As suggest iGoodall and Mardia I (| 1993T) , the density of YY' essentially is the refection 
size-and-shape distribution of Y, moreover, it is invariant to orientation and reflection. Re- 
call that for a given Y : N— lxK, n = A^— 1 < K, then V = YY' has the noncentral Wishart 
distribution with respect to Lebesgue measure on the subspace of definite positive matrices 
V > 0. However, the density of V = YY' when, n > K, exist on the (nK - K(K - l)/2)- 
dimensional manifold of rank-K positive semidefinite N — 1 x N — 1 matric es with K dis - 
tinct positive eigenvalues, which is term ed pseudo- Wishart distribution, see luhligl (Il994j ). 



Diaz-Garcia and Gonzalez-Farias ( 20051) and Di'az-Garcia and Gutierrez- Jaimez ( 20061) . There- 



fore, alternatively to (fTJ we propose the following steeps for obtain the shape coordinates 



LX0~ 1/2 = LZ = Y^V = rW = rW(u), 



(2) 



where V = YY' and W = V/r, with r = ||V||. 

In this work the size and shape distribution for any elliptical model in terms of pseudo- 
Wishart distribution is derived in section [5] Then the shape density is obtained in section [3J 
The central case of the shape density is studied in section[4j and is established that the central 
QR reflection shape density is invariant under the elliptical family. Some particular shape 
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densities are derived in section [5] in order to perform inference on exact distributions; i.e. 
a subfamily of shape distributions generated by Kotz distributions including the Gaussian 
is obtained and applied. Finally in section [SJ two elements of that class (the Gaussian and 
a non Gaussian model) are applied to an existing publish data, the mouse vertebra study. 
Some test for detecting shape differences are gotten and the models are discriminated by 
the use of a dimension criterion such as the modified BIC* criterion. 



2 Pseudo-Wishart size-and-shape distribution 



Let V = YY'. In general (n — N — 1 < K or n > K), the matrix V can be written as 

with rank of Vn = n, 



Vn 

nxn 



v 12 

nx(iV— 

V2I V22 

(JV-l)-nXn (JV-l)-rax(iV— l)-n 



such that, the number of mathematically independent elements in V are m = (N — 1)K — 
nK + n{n + l)/2 corresponding to the mathematically independent elements in Vn > 
if n = N — 1 < K or to the mathematically independent elements of V12, and Vn > 
if n > K. Recall that Vn > 0, in such a way that Vn has n(n + l)/2 mathematically 
independent elements, therefore, 



(dV) 



(dVu) = f\ dvij, 

i<j 

n (N-l) 

(dV n) A (rfV 12) = f\ f\ dvij, if n>K 

i=l j=i 



]£n = N-l<K: 



Formally, the mea sure (dV) i s the Hausdorff measure defined on subspac e of po s itive semidef- 



initc matrices, sec 



Billi ngslevl (Il986l), Dfaz-Garcia and Gutierrez- Jaime3 ( 1997 ) , Diaz-Garcia et al 



!jl99?t ) and iDiaz-Garcia and Gonzalez- Farias ( 2005 ). 

Explicit forms for (dV) can be obtained under diverse factorisations of the measure (dV). 
For example, by using the Cholesky decomposition V = TT', where T : (N — 1) x n is lower 
triangular with ta > 0, i = 1, . . . , min(n, K — 1) 



(dV) = 2 n Y[tfr(dT). 



(3) 



Alternatively, under the nonsingular part of the spectral decompositions V = WjDWi, 
Wi € V„,at_i and D = diag(di, . . . , d n ), di > ■ ■ ■ > d n > , then 



(dV) = 2- n \D\ N - 1 - n Hid, - dj)(dD)(WidWi). 



(4) 



Alternative explicit form for (dV) are given in IDiaz-Garcia and Gonzalez- Farias I ([20051 ) ■ 
Theorem 2.1. The pseudo-Wishart size-and-shape density is 



T n [K/2]\H\k/* 



(5) 
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where (dV) is defined in ^ or (Q} (among many others), ft = X 1 /i,0 C K (B) 
are the zonal polynomials of B corresponding to the partition k — (t\,...t a ) of t, with 
X)f=i** = *! and ( a )« = lL=i( a - (J - i)/ 2 )*^ (°)t = a(a + l)---(a + t - 1), are the 
generalized hypergeometric coefficients and T s (a) = tt s ^ s ~ 1 ^ 4 Ylj—i T(a — (j — l)/2) is the 
multivariate Gamma function, see IJamesI (Il964h and IMuirheadl (Il982h . And is the 

j-th derivative of h with respect to v. The matrix V* is given as, 



V* = 



Vn, under Cholesky decomposition; 
D, under spectral decomposition. 



Proof. See iDfaz-Garcfa and Gonzalez-Farias I (l2005h . 

Observe that the density functions ([5]) with respect to corresponding Hausdorff measure 
([3]) or ((4]) are not unique, moreover, the Hausdorff measures ([3]) or ((4)) are also not unique; 
however, from a practical point of view, for example, the maximum likelihood estimation of 
the unknown parameters is invariant un der different ch oices of mea sures or (HJ and their 
corresponding density functions ©, see Khatri ( 19681 p. 275) and Rao! (|l973i p. 532). 



3 Pseudo-Wishart shape distribution 

Observe that forV : N—lxN—1, of rank n = min(7V— 1, K), hence the matrix V contains 
(N— 1)K — nK + n(n + l)/2 mathematically independent pseudo-Wishart coordinates (vij). 
Let vecw V a vector consisting of mathematically independent elements of V, taken column 
by column. Then the pseudo-Wishart shape matrix W can be written as 

vecwW = ivccwV, r = II VII = VtrV 2 = v/trfY'Y) 2 , 
r 

then bv IMuirheadl (|l982l Theorem 2.1.3, p.55), 

m / rn \ 

(dvecw V) = r rn [J sin" 1 " 4 t f\ dQi A dr, 

i=l \i=l ) 

with m= (N - l)K - nK + n(n + l)/2 - 1. Denoting u = (6q, . . . ,0 m )', (du) = A™i d ^ 
and J(u) = n™ i sin m_l 0;, with r > 0, < 0; < tt (i = 1, . . . , m - 1), < 6 m < 2tt, then 

(dV) = r m J(u)(du) A dr. 

Theorem 3.1. The pseudo-Wishart reflection shape density is 

^w(W) = V \KI^.\KI2 



Tn\Kl2\\Z\Kl* ^ t\(\K) K 

I r m-n(K~N)/2+t h (2t) ^ ^ w + ^ fi ] ^ ^ ^ 

J 



(6) 



where W* = V*/r. 
Proof. The density of V is 



dFv(v) = r n [K/2]\n*/' h±< Ok).. 1\ {dY) - 



t—Q k \2 J 
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Making the change of variables W(u) = V/r, the joint density function of r and u is 
„ , „ rx 7r ™*/2| r w*|( jr -*)/ a ^[trfrE^W + 



x -r m J(u)dr A (du). 

Now, note that 

• c^rns^w) = r*c«(nE- 1 w). 

• ^[trfrS^W + n)] =/i( 2 *)[rtrS- x W + trn]. 

Finally, collecting powers of r by r m +™( i:f - JV )/ 2 + t ; the marginal of W is obtained integrating 
with respect to r. □ 
When £ = ct 2 I, then ft = n&^fi'/a 2 , |£| K/2 = ct m , M = (N — 1)K andrtrS^W = 
rtrW/cr 2 , thus Theorem 13. II becomes. 

Corollary 3.1. The isotropic pseudo-Wishart reflection shape density is 

T .^| W ,,» ,,, 3j(u) g ^.(>w) 

/>OC 

x / r ™-»(*-")/2+V 2t >[rtrW/CT 2 + tr 0](dr) A (du). (7) 
j o 

4 Central case 

The central case of the preceding sections can be derived easily. 

Corollary 4.1. The central pseudo-Wishart reflection size- and- shape density is given by 

nK/2\ V *\{K-N)/2 

Proof. It is straightforward from Theorem [2J] just take /x = and recall that M°)[tr ■] 
/Mr- . □ 
Similarly: 

Corollary 4.2. The central pseudo-Wishart reflection shape density is given by 

Proof. Just take /x = and fr(°)[tr •] = h[tr •] in Theorem ED □ 
Observe that it is possible to obtain an invariant central shape density, i.e. the density 

does not depend on function h(-) Let h be the density generator of Y ~ £ N _i <K (0, I® I, h), 

i.e. 

/y(Y) - ft(tr YY'), 
then bv lFang and Zhang I (|l990L eq. 3.2.6, p.102), 
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Taking s = r 2 with dr = ds/ (2y/s) 



JN-\)K/2-\ 



h(s)dr 



T[(N-l)K/2] 



Hence, if s = (trS- 1 W)r, ds = (tr S^W^dr), then 



poo 

/ r m -"( x - JV )/ 2 /i[rtrS" 1 W](rfr) 
Jo 



m-n{K-N)/2 



h(s)- 



ds 



,„ yftrS^W); w (trS~ 1 W) 

= (trE-lw^-AO/S-m-l / s (2m-n(K-JV))/2+l-l /l(s)rfs 

Jo 

r tr ^-i^n»(K-jv)/2-m-i IV - "(^ - ^)/ 2 + 1] 



Thus: 



Corollary 4.3. When fi = the pseudo-Wishart reflection shape density is invariant under 
the elliptical family and it is given by 



dF w (W) 



^K- m+n( K-N),2-l T[m n{R N)/2 ± ij (K _ N)/2 

r„[x/2]|E|^/ 2 1 1 

xJ(u)(trS- 1 W)"( K - Ar )/ 2 - m - 1 ((iu). 



Now, if S = cr 2 I, then 

(tr^W)"^ - ^ 2- " 1-1 = (i/ cr 2 )™(^-^)/ 2 -™-i(t r W)" (K " Ar)/2 " m " 1 , 
and |S|^/ 2 = (a 2 ) M / 2 , thus: 

Corollary 4.4. When /j, = and X = er 2 I i/ie pseudo-Wishart reflection shape density is 
invariant under the elliptical family and it is given by 

dF , w) _ ^K- m+n{ K- N)/ 2-. T[m _ n[R N)/2 ij /2 

w[ ' ~ 2T n [K/2}(<r 2 ) n ( K - N )/ 2 + M / 2 - m - 1 1 1 

( trW )n(X-iV)/2-m-l x J( u )( du ). 



5 Some particular models 

Finally, we give explicit shapes densities for some elliptical models. 
The Kotz type I model is given by 





■i+ K( rr 


'if(JV-l)" 
2 




n K(N- 


l)/2p 




1)1 



y exp{-i?j/}, 



d fe [2/ T " 1 exp{-i?y}] 
lhen, the corresponding fc-th derivative — r , is 



k..T-l 



{-Rfy 

cxp{Ry} 



m— 1 



dy k 

m—l 



i=0 



{-RyY 



(8) 



(9) 
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It includes the Gaussian case, i.e. when T = 1 and R = 1/2, here the derivation is 
straightforward from the general density. 

The required derivative follows easily, it is, 

nM/2 

h {k \y) = —{-R) k eM-Ry) 



and 



f 

Jo 



m- n (K-N)/2+t,(2t) 



/i (2t) [rtrS _1 W + trn]dr 



_ 7r -M/2^-m+t+i(-2+M+n(K-iV))^ r j,-l^ys-l-m-t+n(X-JV)/2 



1 + m + t + 2 n i- K + N ) 



x etr(-Rto)T 
So, we have proved that 

Corollary 5.1. The Kotz type I (T = 1 ) Pseudo-Wishart reflection shape density is 

n (nK-M)/2\ W *\(K-N)/2j( u j etr 



dFw(W) 



Rm -^-2+M+n(K-N)) Vn | S |jf/ 2 

^r[l + m + t + n(-if + iV)/2] C^MIE^W) 
X t!(tr W) 1 + m + t_ ™( Ar --' v )/ 2 ^ 



A" 



w/iere M = (N - l)K. 



Finally, for the Kotz type I model (JSJ and the given 2t-th derivative, we can prove easily 
that 

Corollary 5.2. The pseudo-Wishart reflection shape density based on the Kotz type I model 
is given by 



dFw(W) 



T„ [K/2] |£|*/ 2 



EE' 



-7(W(u),r) (du) (10) 



where 



J(W(u), r) = I 
lo 



G e- RB A~ a - x 



"(*- w )/ 2 +V 2t) [ r tr $r X W + tr fi](dr)(du) 

u-l 

Y J {u\)' 1 R 2t - 1 - a - u B T ' 1 - u Y[l + a + u] Yl (T - 1 - s) 



E 



'it 



n(T-i 



i=0 
u-l 



(u!) (-1) i? B 



<r[l + a + u] (T - 1 - v - s) 



with M = (N - 1)K, G = tt- m / 2 R t - 1+m ' 2 T[M /2}/Y[T - 1 + M /2], 4 = trE^W, 
S = tr n and a = m- n(K -N)/2 + t. 

This density seems uncomputable but it easy to see that it has the form of a generalised 
hypergeometric functions (see next sectio n). These series can be de termined by suitable 
modifications of the algorithms given by iKoev and Edelman I ([20061 ) for qFi and at the 
same computational costs. Moreover, if the parameter T > is an integer, the series are 
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simplified substantially. For example, we can prove that the shape density associated to a 
Kotz model with T = 3, R = 1/2 and the isotropic assumption (S = <j 2 Yn-i and = Ik), 
is given by: 



dFw(W) = 



7r (nK-M)/2 |w » |( y-iv)/2 J(u) etr^y^/g^) 

2 _ 3 - m+ (M+n(if-iV))/2 M ( M + 2 )r„ [Jf/2] 

2i) 2 - 2t]r [a] + 2(B - 2t)r [a + 1] + T [a + 2] 



(11) 



t l ff Af-2-2m+»(K-iV) (trW)° 



where M = (N - l)K, B = trfi'fi/2a 2 and a = l + m + t + n{-K + N)/2. 

Other examples shall be considered in the next section, when T = 1 and T — 2. More 
complex densities in the co n text o f affine shape theory were computed by using the same 
idea, see Caro-Lopera et al. ( 20091 ). 



6 Example 



This problem is s tudied in detail by Dryden and Mardia ( 1998() under a number of ap- 
proaches (see also iMardia and Dryden I (|l989i )). The experiment considers the second tho- 
racic vertebra T2 of two groups of mice: large and small. The mice are selected and classified 
according to large or small body weight, respec tively; in this case, the samp le consists of 23 
large and small bones (the data can be found in lDrvden and Mardia I (|1998l p. 313-316)). It 
is of interest to study shape differences between the two groups. The vertebras are digitised 
and summarised in six mathematical landmarks which are placed at points of high curva- 
ture, see figure [T] they are symmet rically selected by measurin g the extreme positive and 
negative curvature of the bone. See iDrvden and Mardia I (|1998l ) for more details. 




6 <M 

Figure 1: Mouse vertebra 



Here we study three models, the Gaussian shape, and two shape Kotz type I models with 
T = 2 and T = 3. 

First, the isotropic Gaussian shape density is obtained from corollary 15.11 when we set 
R = |, S = a 2 I N -!, & = I K ,Q = XrVerV = <j- 2 hh', namely 
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Corollary 6.1. The Pseudo-Wishart reflection shape density based on the isotropic Gaus- 
sian is given by 



dFw(W) = 



^(nK-Af)/2 |w * |( if-iV)/2 J(u)etr (_^ /2a 2) 
2-m+(-2+M+n(K-N))/2-p n [K/2\ 

r[l + rn + t + n(-K + N)/2] ^(^/t'W/t) 



w/iere M = (N — 1)K. 

A second shape distribution that we will use follows from corollary |5.2l bv taking R = 1/2, 
T = 2, i.e. 

7r («^-M)/2| W *|(^-JV)/2 J(u) etr (_ /i ' /i / 2(J 2) 



2 _2-m+(Af+ n (K-iV))/2j WTn [^/ 2 ] 



\ - (B - 2t)T [a] + r [a + 1] ^ ^j/Wji) 

X 4- t ! (7 M-2-2 m +n(Jf-JV) ( t rW) a ^ 

where M = (N - 1)K, B = tr///x/2er 2 and a = 1 + m + t + n{-K + N)/2. 

And the third shape model of this example corresponds to the isotropic Kotz distribution 
with T = 3 and R = 1/2, see (fTTj) . 

In order to select the best elliptical model, a number of dimension criteri a have been pro- 
posed . We shall consider a modification of the BIC* sta tistic as discussed in I Yang and Yang 



( 2007 ). and which was first achieved bv iRissanenl ( 1978 ) in a coding theory framework. The 



modified BIC* is given by: 

BIC* = -2£(p,, a 2 ,h) + n p (log(n + 2) - log 24), 

where £(/i, cr 2 , h) is the maximum of the log- likelihood function, n is the sample size and n p 
is the number of parameters to be estimated for each particular shape density. 

Now, if the goal of the shape analysis searches the best elliptical distribution, among a set 
of proposed models, the modified BIC* criterion suggests to choose the model for which the 
modif ied B IC* receives it s smallest value. In addition, as proposed bv iKass and Rafterv 



(Il995h and lRaftervl ljl995h . ;he following selection criteria have been employed in order to 



compare two contiguous models in terms of its corresponding modified BIC* . 

Table 1: Grades of evidence corresponding to values of the BIC* difference. 



BIC* difference 


Evidence 


0-2 


Weak 


2-6 


Positive 


6-10 


Strong 


> 10 


Very strong 



Now, recall that for a general density generator h(-) 

X ~ £AT X x(At x ,S x ® &,h), 

where fx = Lp x , then 

Y ~ £Ar_ lxi f(/x0" 1/2 , S ® I K ,h), 
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( Mn 


M12 \ 


M21 


M22 


M31 


M32 


M41 


M42 


V M51 


M52 / 



with £ = LS X L'. 

In the mouse vertebra experiment, we want to find the maximum likelihood estimators 
(MLE) of the mean shape 



A 4 



and the scale parameter er 2 defined in the isotropy assumption & = Ik and S = er 2 Ijv_i, 
(in order to accelerate the computations of this example we fix the variance of the process 
as 50 -the maximum median variance of the two samples-). This optimisation is applied in 
the two independent populations, the small and large groups; first by assuming a Gaussian 
model and afterwards by considering two Kotz models indexed by T = 2 and T = 3. 

The general procedure is the following: Let £(fl,a 2 , h) be the log likelihood function of 
a given group- model. The maximisation of the likelihood function £(/x, cr 2 , h), is obtained in 
this paper by using the N elder-Mead Simplex Method, which is an unconstrained multivari- 
ate function using a derivative-free method; specifically, we apply the routine fminsearch 
implemented by the sofware MatLab. 

As the reader can check, the shape densities are series of zonal polynomials of the form 



£ 

t=0 



/(t,trX)^C K (X) 



t! 



0)k 



(12) 



which has hypergeometric series 



t=o 



C K (X) 
(«)« 



as a pa rticular case; these series were non computable for decades. The work of lKoev and Edelman 
(2006) solved the problem and it let the computation of the hypergeometric series by trun- 
cation of the series until the coefficient for large degrees are zero under certain tolerance. 
The cited algorithm gives the coefficients of the series, then, we can modified the algorithm 
for hypergeometric series to compute the shape densities with the same computational costs, 
multiplying each coefficient of the series by the required function f(t, trX). 



At this point the log likelihood can be computed, then we use fminsearch for the MLE's. 
The initial value for the algorithm is the sample mean of the elliptical matrix variables 
Y ~ £/v_i x y(^6~^ 2 , Sg>Ift- , h). However, we need to deal with an open problem proposed 
bv lKoev and Edelman I ( 20061 ). the relationship between the convergence and the truncation 
of the series. Concretely, how many terms we need to consider in the series (I12[) in order 
to reach some fixed tolerance for convergence. A numerical solution consists of optimising 
the log likelihood, by increasing the truncation until, the MLE's and the maximum of the 
function, reach an equilibrium, which depends on the standard accuracy and tolerance of the 
routine fminsearch. We tried the truncations 20, 40, 60, 80, 100, 110, 120, 140 and 160, and we 
note that after the truncation 120 the solutions stabilise, the maximum likelihood estimators 
for location parameters associated with the small and large groups under the Gaussian, Kotz 
T = 2 and Kotz T = 3 models, are summarized in tables [2H respectively. Tables also show 
the modified BIC* value, the number of iterations for obtaining the convergence and the 
time in seconds for each optimisation. 
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Table 2: The maximum likelihood estimators for the small group under the Gaussian model 



Trunc. 


Mn 


fJ-12 


M21 


fJ-22 


M31 


M32 


fJ-41 


20 


30.40 


-8.13 


5.73 


9.47 


4.01 


17.34 


-2.70 


40 


-0.47 


-44.69 


15.04 


-4.54 


25.27 


0.60 


4.88 


60 


-2.10 


-54.84 


18.31 


-6.09 


31.03 


-0.12 


6.17 


80 


-0.70 


-63.48 


21.37 


-6.46 


35.89 


0.83 


6.94 


100 


-2.61 


-71.03 


23.73 


-7.85 


40.19 


-0.10 


7.98 


110 


-0.54 


-74.58 


25.13 


-7.50 


42.16 


1.14 


8.12 


120 


-3.41 


-77.44 


25.88 


-8.71 


43.94 


-0.37 


8.79 


140 


-3.41 


-77.44 


25.88 


-8.71 


43.94 


-0.37 


8.79 


160 


-3.41 


-77.44 


25.88 


-8.71 


43.94 


-0.37 


8.79 



Trunc. 


M42 


M51 


M52 


BIG* 


Time 


Iter. 


20 


4.24 


-6.82 


-20.65 


-3538.26 


317 


4103 


40 


5.21 


-30.81 


2.11 


-4155.34 


281 


1881 


60 


6.23 


-37.75 


3.63 


-4659.16 


417 


1923 


80 


7.40 


-43.77 


3.01 


-5110.98 


426 


1455 


100 


8.08 


-48.90 


4.63 


-5532.79 


742 


2025 


110 


8.72 


-51.43 


3.34 


-5735.95 


607 


1507 


120 


8.76 


-53.43 


5.37 


-5914.74 


721 


1640 


140 


8.76 


-53.43 


5.37 


-5914.74 


721 


1640 


160 


8.76 


-53.43 


5.37 


-5914.74 


721 


1640 



Table 3: The maximum likelihood estimators for the small group under the Kotz T = 2 
model 



Trunc. 


Mil 


M12 


M21 


M22 


M31 


M32 


M41 


20 


-6.06 


-32.06 


10.23 


-5.19 


18.24 


-2.80 


4.18 


40 


4.42 


-46.00 


15.97 


-3.02 


25.91 


3.39 


4.45 


60 


-0.53 


-56.62 


19.07 


-5.73 


32.01 


0.80 


6.18 


80 


-1.88 


-65.36 


21.89 


-7.04 


36.97 


0.20 


7.28 


100 


-0.04 


-73.09 


24.68 


-7.18 


41.31 


1.40 


7.90 


110 


-1.62 


-76.63 


25.72 


-8.06 


43.34 


0.57 


8.47 


120 


-1.84 


-79.60 


26.76 


-8.39 


45.13 


0.56 


8.83 


140 


-1.84 


-79.60 


26.76 


-8.39 


45.13 


0.56 


8.83 


160 


-1.84 


-79.60 


26.76 


-8.39 


45.13 


0.56 


8.83 



Trunc. 


M42 


M51 


M52 


BIG* 


Time 


Iter. 


20 


3.12 


-21.88 


5.46 


-3584.58 


311 


1957 


40 


5.89 


-31.91 


-1.22 


-4203.67 


627 


2052 


60 


6.61 


-39.04 


2.62 


-4709.27 


890 


1986 


80 


7.49 


-45.02 


3.90 


-5162.67 


951 


1566 


100 


8.60 


-50.43 


2.94 


-5585.92 


1468 


1978 


110 


8.84 


-52.81 


4.17 


-5789.74 


1160 


1386 


120 


9.18 


-54.98 


4.37 


-5969.11 


1464 


1656 


140 


9.18 


-54.98 


4.37 


-5969.11 


1464 


1656 


160 


9.18 


-54.98 


4.37 


-5969.11 


1464 


1656 



The computations were performed with a processor Intel(R) Corel(TM)2 Duo CPU, 
E7400@2.80GHz, and 2,96GB of RAM. 



12 



Table 4: The maximum likelihood estimators for the small group under the Kotz T = 3 
model 



Trunc. 


A«ii 


M12 


M21 


A«22 


fJ-31 


M32 


M41 


20 


-2.37 


-33.66 


11.14 


-4.10 


19.07 


-0.68 


3.91 


40 


-10.75 


-46.42 


14.62 


-8.18 


26.44 


-5.17 


6.28 


60 


-0.29 


-58.24 


19.64 


-5.81 


32.92 


0.97 


6.32 


80 


-1.69 


-67.11 


22.50 


-7.15 


37.96 


0.35 


7.45 


100 


-1.31 


-74.91 


25.17 


-7.79 


42.36 


0.72 


8.24 


110 


-1.33 


-78.51 


26.38 


-8.15 


44.39 


0.77 


8.64 


120 


-1.75 


-81.49 


27.42 


-8.54 


46.20 


0.65 


9.03 


140 


-1.75 


-81.49 


27.42 


-8.54 


46.20 


0.65 


9.03 


160 


-1.75 


-81.49 


27.42 


-8.54 


46.20 


0.65 


9.03 



Trunc. 


M42 


fJ.51 


M52 


BIC* 


Time 


Iter. 


20 


3.71 


-23.13 


2.97 


-3625.80 


101 


2083 


40 


4.30 


-31.60 


9.27 


-4247.02 


185 


2067 


60 


6.82 


-40.17 


2.52 


-4754.54 


273 


2050 


80 


7.72 


-46.23 


3.84 


-5209.68 


322 


1816 


100 


8.68 


-51.63 


3.88 


-5634.52 


329 


1449 


110 


9.10 


-54.11 


4.05 


-5839.09 


440 


1776 


120 


9.41 


-56.30 


4.38 


-6019.10 


461 


1688 


140 


9.41 


-56.30 


4.38 


-6019.10 


461 


1688 


160 


9.41 


-56.30 


4.38 


-6019.10 


461 


1688 



Table 5: The maximum likelihood estimators for the large group under the Gaussian model 



Trunc. 


jttn 


A*12 


A*21 


p.22 


Jj.31 


M32 


ji/41 


20 


-19.04 


-22.88 


5.37 


-8.26 


15.82 


-10.82 


3.84 


40 


-29.85 


-29.93 


6.54 


-12.38 


20.99 


-17.32 


5.62 


60 


-15.90 


-49.41 


14.07 


-9.87 


32.64 


-7.20 


5.30 


80 


-41.34 


-43.55 


9.73 


-17.35 


30.42 


-23.86 


7.93 


100 


-66.69 


-8.40 


-3.88 


-21.92 


9.43 


-42.24 


8.53 


110 


-40.91 


-57.46 


14.17 


-18.57 


39.32 


-22.74 


8.83 


120 


-32.30 


-65.98 


17.67 


-16.67 


44.17 


-16.68 


8.38 


140 


-32.30 


-65.98 


17.67 


-16.67 


44.17 


-16.68 


8.38 


160 


-32.30 


-65.98 


17.67 


-16.67 


44.17 


-16.68 


8.38 



Trunc. 


M42 


M51 


M52 


BIC* 


Time 


Iter. 


20 


1.42 


-18.06 


15.31 


-3540.51 


155 


2075 


40 


1.52 


-23.59 


23.97 


-4159.47 


259 


1824 


60 


4.80 


-39.19 


13.02 


-4665.18 


274 


1295 


80 


2.35 


-34.35 


33.21 


-5118.90 


300 


1044 


100 


-3.59 


-6.20 


53.12 


-5542.62 


978 


2753 


110 


4.04 


-45.42 


32.97 


-5746.72 


449 


1143 


120 


5.65 


-52.16 


26.15 


-5926.64 


509 


1172 


140 


5.65 


-52.16 


26.15 


-5926.64 


509 


1172 


160 


5.65 


-52.16 


26.15 


-5926.64 


509 


1172 



Figures [2] show the behavior of the maximum of the log likelihood when the number of 
iterations is increased. In this case we use a truncation of 160, and again, we note that the 
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Table 6: The maximum likelihood estimators for the large group under the Kotz T = 2 
model 



Trunc. 


Mn 


M12 


fJ.21 


M22 


M31 


M32 


M41 


20 


-21.67 


-21.97 


4.83 


-9.01 


15.40 


-12.56 


4.10 


40 


-36.15 


-24.57 


4.23 


-13.84 


17.94 


-21.68 


6.00 


60 


-32.77 


-42.35 


10.19 


-14.52 


29.14 


-18.44 


6.82 


80 


-31.52 


-53.20 


13.74 


-15.19 


36.02 


-16.98 


7.42 


100 


-31.91 


-61.32 


16.28 


-16.10 


41.25 


-16.74 


8.02 


110 


-38.06 


-61.70 


15.79 


-18.09 


41.86 


-20.66 


8.78 


120 


-42.11 


-62.61 


15.65 


-19.44 


42.61 


-23.16 


9.32 


140 


-42.11 


-62.61 


15.65 


-19.44 


42.61 


-23.16 


9.32 


160 


-42.11 


-62.61 


15.65 


-19.44 


42.61 


-23.16 


9.32 



Trunc. 


M42 


M51 


M52 


BIC* 


Time 


Iter. 


20 


1.13 


-17.32 


17.40 


-3586.83 


339 


2165 


40 


0.44 


-19.28 


28.94 


-4207.80 


476 


1594 


60 


2.80 


-33.46 


26.38 


-4715.29 


519 


1133 


80 


4.18 


-42.10 


25.47 


-5170.59 


706 


1144 


100 


5.12 


-48.56 


25.83 


-5595.75 


1143 


1528 


110 


4.74 


-48.81 


30.73 


-5800.52 


1105 


1349 


120 


4.58 


-49.41 


33.92 


-5981.01 


1320 


1459 


140 


4.58 


-49.41 


33.92 


-5981.01 


1320 


1459 


160 


4.58 


-49.41 


33.92 


-5981.01 


1320 


1459 



Table 7: The maximum likelihood estimators for the large group under the Kotz T = 3 
model 



Trunc. 


fj.ii 


H12 


(J.21 


/122 


jU31 


^32 


H41 


20 


-31.74 


3.30 


-4.16 


-9.72 


-0.19 


-20.55 


3.56 


40 


-41.25 


-18.15 


1.70 


-14.83 


14.14 


-25.34 


6.17 


60 


-24.01 


-49.58 


13.33 


-12.45 


33.24 


-12.39 


6.27 


80 


-32.14 


-54.75 


14.17 


-15.53 


37.05 


-17.29 


7.60 


100 


-41.95 


-57.11 


13.96 


-18.87 


39.15 


-23.43 


8.93 


110 


-35.24 


-65.37 


17.23 


-17.55 


44.04 


-18.63 


8.70 


120 


-39.44 


-66.43 


17.11 


-18.97 


44.89 


-21.22 


9.27 


140 


-39.44 


-66.43 


17.11 


-18.97 


44.89 


-21.22 


9.27 


160 


-39.44 


-66.43 


17.11 


-18.97 


44.89 


-21.22 


9.27 



Trunc. 


fJ.42 


Mbi 


M52 


BIC* 


Time 


Iter. 


20 


-2.58 


2.86 


25.23 


-3628.05 


73 


1494 


40 


-0.67 


-14.14 


32.95 


-4251.15 


125 


1411 


60 


4.26 


-39.27 


19.47 


-4760.57 


163 


1239 


80 


4.32 


-43.32 


25.97 


-5217.61 


197 


1110 


100 


3.93 


-45.13 


33.79 


-5644.35 


299 


1345 


110 


5.37 


-51.75 


28.52 


-5849.87 


304 


1246 


120 


5.21 


-52.46 


31.83 


-6031.00 


387 


1457 


140 


5.21 


-52.46 


31.83 


-6031.00 


387 


1457 


160 


5.21 


-52.46 


31.83 


-6031.00 


387 


1457 



log likelihood is bounded for a very small number of iterations in each particular model. 
According to the modified BIC criterion, we can order the models in the large and small 
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Figure 2: Behaviors of Log-Likelihood functions in terms of the iteration number of the 
fminscarch routine. 



groups as follows: (1) Kotz T = 3, (2) Kotz T = 2, (3) Gaussian. 

This order can be seen in figure [3J which compares the log- likelihood of the two groups 
under the three models in terms of the algorithm iteration when the truncation is set in 160. 

Modified BIC* of both groups shows a very strong difference (see table []} between the 
best model (1) and the classical Gaussian (3). 

In both cases, the true models of the data maybe have tails that are weighted more or 
less than Gaussian model or that the shape distribution present grater or smaller degree of 
kurtosis than the Gaussian model. 

Remark 6.1. We have used this example from the literature to illustrate the generalised 
shape theory; moreover, based on the modified BIC* , we found that the Kotz distribution 
(with T = 3) is the best model in this experiment. However, suppose the expert in the 
area of application knows that the landmarks have a Gaussian distribution, then we must 
apply the classical theory of shape (based on normality). Alternatively, if the expert in the 
application area suspects that the landmarks do not have a Gaussian distribution, so we can 
apply the generalized theory proposed here. In this case the expert has the necessary tools 
to choose an elliptical model (as an alternative to the Gaussian distribution), according to 
the characteristic of the sample which reveal and/or support a non Gaussian distribution, 
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Figure 3: Comparison of Log-Likelihood in all models and groups in terms of the iteration 
number of the fminsearch routine. 



i.e. to select a distribution with more or less heavy tails, or more or less kurtosis than the 
Gaussian density; among many others possible characteristics. 

Once the best models are selected for the small and large groups, we can test equality 
in mean shape between the two independent populations. In this experiment we have: 
two independent samples of 23 bones and 10 population shape parameters to estimate for 
each group. Namely, if L(fi s ,fi l ) is the likelihood, where n s , represent the mean shape 
parameters of the small and large group, respectively, then we want to test: H : H s = Hi vs 
H a : fi s ^ Hi- Then —2 log A = 2sup ffi logL(/i s , Hi) — 2 sup Ho logL(/n s , Hi), and according 
to Wilk's theorem —2 log A ~ Xio under Hq. 

Using fminsearch with a truncation of 160 we obtained that: 

-2 log A = 2(3999.1273)- 2(3990.3601) = 17.5344, 

this is the same result when the series were truncated at 120 and 140. Since the p- value for 
the test is 

P(Xw > 17.5344) = 0.0633 
we have some evidence that the small and large mouse vertebrae are different in mean shape. 



Mardia and Drvden I (119891) studied this problem with a Gaussian model and Bookstein 



coordinates (see also Drvden and Mardia I |l998)) and they obtained for the same test an 



approximate p-value of zero (P(xs ^ 127.75). Our test also rejects the equality of mean 
shape based on a better non Gaussian model but without an strong evidence as the Gaussian 
model suggests. 

Note that the MLE's given by tables[2]|4]correspond to the matrix fj, in Y ~ £n-ixk(h, S$ 
Ik, h), we can use this information and the transformations 

LX0~ 1/2 = LZ = Y^V = rW = rW(u), 

(with V = YY and W = V/r) to estimate the different means at each step, i.e.: the 
original elliptical mean /x x , the size-and-shape mean /x v and the shape mean /x w . 

This example deserves a detailed study about some important facts, i.e. the distribution 
of —2 log A for small samples, the truncation of the series, global optimisation methods, etc. 
These problems shall be considered in a subsequent work. 
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A final comment, for any elliptical model we can obtain the SVD reflection model, 
however a nontrivial problem appears, the 2t-th derivative of the generator model, which 
can be seen as a partition theory problem. For the general case of a Kotz model (s 7^ 1), and 
another models as P earson II and VII, Bessel, Jensen-logistic, we can use formulae for these 
derivatives given by ICaro-Lopera et all ( 2009h . The resulting densities have again a form 
of a generalised series of zonal polynomials which can be computed efficiently after some 
modification of existing works for hypergeometric series, see lKoev and Edelman (2006), thus 
the inference over an exact density can be performed, avoiding the use of a ny asymptoti c 
distribution, and the initial transformation avoids the invariant polynomials of Davis ( 1980T ). 
which at present are not computable for large degrees. 
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